Do participant with recurrent BV have a composition similar to their incident BV composition?

Author

Laura Symul

Code
# We load the MAE object and retrieve the clinical data
mae <- load_latest_mae(str_c(data_dir(), "03_augmented_mae/"))
clin <- MultiAssayExperiment::colData(mae) %>% as.data.frame()
Code
target_visits <- c(4, 7)

Methods

To assess whether participants with recurrent BV (or lack of Lactobacillus-dominance) have a microbiota composition similar to their initial BV composition, we primarily rely on the Bray-Curtis dissimilarity between the pre-MTZ microbiota composition expressed as ASV relative abundances (the lowest possible resolution given 16S data) and the corresponding composition at the target visit (the week 24 visit as primary target visit and the week 12 visit for a sensitivity analysis).

Participants were included in this analysis if they were diagnosed with BV or who had less than 50% Lactobacillus at the target visit and microbiota composition data was availble at the pre-MTZ visit. Analyses will distinguish between participants with or without BV at the target visit to assess whether conclusions differ based on BV diagnosis, using the same criteria for BV diagnosis as in Cohen et al, 2020.

Because some participants with diverse microbiota may have a composition similar to other participants with diverse microbiota, we will not only evaluate the similarity to their own microbiota, but also compare it to the similarity with the microbiota of other participants to compute a percentile rank. Someone with a small percentile rank is more similar to themselves than to everyone else pre-MTZ. Some participants could be relatively similar to themselves but be more similar to other participants if several participants were enrolled with similarly diverse microbiota.

From the dissimilarity matrix \(D\) where \(d_{ij}\) is the dissimilarity between participants \(i\) pre-MTZ and participant \(j\) at the target visit, we can also construct the minimum spanning tree (MST) to visualize the relationships between participants and count the number of “pure edges” (i.e., edges connecting the two visits (pre-MTZ and target visits) of the same participant). We can then test whether the observed number of pure edges is significantly higher than expected by chance using a graph permutation test.

We will evaluate whether these different metrics differ by arm to understand if the LBP (Lactin-V) disrupts the microbiota of “non-respondent” participants more than the Placebo does.

The analyses are grouped in a function recurrent_BV_composition_analysis() that takes the MAE object and the target visit as input and returns a list of results, including the dissimilarity to self and others, the MST, and the graph permutation test results.

Code
recurrent_BV_composition_analysis
## function (mae, target_visit) 
## {
##     target_visit <- as.numeric(match.arg(as.character(target_visit), 
##         choices = as.character(target_visits), several.ok = FALSE))
##     df <- select(left_join(full_join(distinct(select(filter(as_tibble(MultiAssayExperiment::colData(mae)), 
##         has_V0, AVISITN == target_visit), USUBJID, BV)), mutate(select(mutate(filter(get_assay_wide_format(mae, 
##         "prop_Lacto"), AVISITN == target_visit), prop_Lacto_target = assay$prop_Lacto), 
##         USUBJID, prop_Lacto_target), mBV = prop_Lacto_target < 
##         0.5), by = join_by(USUBJID)), dplyr::rename(distinct(select(as_tibble(MultiAssayExperiment::colData(mae)), 
##         USUBJID, ARM)), Arm = ARM), by = join_by(USUBJID)), USUBJID, 
##         Arm, everything())
##     df <- filter(df, (BV == "Yes") | mBV)
##     outcome_table <- gt::gt(pivot_wider(mutate(dplyr::count(df, 
##         BV, mBV), BV = str_replace_na(BV, "Unknown"), mBV = ifelse(mBV, 
##         "< 50% Lactobacillus", "≥ 50% Lactobacillus")), names_from = mBV, 
##         values_from = n, values_fill = 0), caption = "Number of participants included in the analysis by BV diagnosis and Lactobacillus dominance at the week 24 visit.")
##     ASV_prop <- select(filter(get_assay_wide_format(mae, "ASV_16S_filtered_p"), 
##         USUBJID %in% df$USUBJID, AVISITN %in% c(0, target_visit)), 
##         USUBJID, AVISITN, assay)
##     ASV_target <- select(mutate(filter(ASV_prop, AVISITN == target_visit), 
##         ASV_p_target = set_rownames(as.matrix(assay), USUBJID)), 
##         USUBJID, ASV_p_target)
##     ASV_pre_MTZ <- select(mutate(filter(ASV_prop, AVISITN == 
##         0), ASV_p_pre_MTZ = set_rownames(as.matrix(assay), USUBJID)), 
##         USUBJID, ASV_p_pre_MTZ)
##     df <- left_join(left_join(select(df, -any_of(c("ASV_p_target", 
##         "ASV_p_pre_MTZ"))), ASV_target, by = "USUBJID"), ASV_pre_MTZ, 
##         by = "USUBJID")
##     rm(ASV_prop, ASV_target, ASV_pre_MTZ)
##     df <- mutate(df, BC_pre_MTZ_target = 1 - 2 * rowSums(pmin(df$ASV_p_target, 
##         df$ASV_p_pre_MTZ, na.rm = TRUE))/rowSums(df$ASV_p_target + 
##         df$ASV_p_pre_MTZ))
##     BC_all_others <- ungroup(mutate(arrange(group_by(mutate(arrange(mutate(pivot_longer(rownames_to_column(as.data.frame(ValenciaR::BC(df$ASV_p_target, 
##         drop_na(as.data.frame(df$ASV_p_pre_MTZ)))), "USUBJID"), 
##         cols = -USUBJID, names_to = "pre_MTZ_USUBJID", values_to = "BC"), 
##         same_USUBJID = USUBJID == pre_MTZ_USUBJID, same_USUBJID_label = fct_rev(fct_infreq(ifelse(same_USUBJID, 
##             "Same participant", "Other participants")))), !same_USUBJID, 
##         BC), USUBJID = fct_inorder(USUBJID), order_BC = as.integer(USUBJID)), 
##         USUBJID), USUBJID, BC), has_own_BC = any(same_USUBJID), 
##         BC_mean = mean(BC, na.rm = TRUE), BC_sd = sd(BC, na.rm = TRUE), 
##         z_score = (BC - BC_mean)/BC_sd, percentile_rank = ecdf(BC)(BC) * 
##             100))
##     df <- left_join(select(df, -any_of("percentile_rank")), select(filter(BC_all_others, 
##         same_USUBJID), USUBJID, percentile_rank), by = join_by(USUBJID))
##     BC_all_others <- mutate(arrange(ungroup(mutate(group_by(mutate(BC_all_others, 
##         is_smallest_distance = same_USUBJID & (BC == min(BC[same_USUBJID], 
##             na.rm = TRUE)), is_smallest_percentile_rank = same_USUBJID & 
##             (percentile_rank == min(percentile_rank[same_USUBJID], 
##                 na.rm = TRUE)), is_largest_distance_within_smallest_percentile_rank = is_smallest_percentile_rank & 
##             (BC == max(BC[is_smallest_percentile_rank], na.rm = TRUE)), 
##         combo = 100 * (1 - BC) + percentile_rank, is_smallest_combo = same_USUBJID & 
##             (combo == min(combo[same_USUBJID], na.rm = TRUE)), 
##         is_largest_distance = same_USUBJID & (BC == max(BC[same_USUBJID], 
##             na.rm = TRUE)), is_largest_percentile_rank = same_USUBJID & 
##             (percentile_rank == max(percentile_rank[same_USUBJID], 
##                 na.rm = TRUE))), USUBJID), note = case_when(any(is_smallest_distance) ~ 
##         "smallest distance", any(is_largest_distance_within_smallest_percentile_rank) ~ 
##         "smallest percentile rank with largest BC", any(is_smallest_combo) ~ 
##         "smallest combo", any(is_largest_distance) ~ "largest distance", 
##         any(is_largest_percentile_rank) ~ "largest percentile rank", 
##         TRUE ~ ""))), order_BC), note = fct_relevel(fct_inorder(factor(note)), 
##         "", after = Inf), pid_label = str_replace(LETTERS[as.integer(note)], 
##         LETTERS[nlevels(note)], ""))
##     df <- mutate(left_join(left_join(select(df, -any_of(c("prop_Lacto_pre_MTZ", 
##         "prop_Lacto_post_MTZ", "delta_prop_Lacto"))), select(mutate(filter(get_assay_wide_format(mae, 
##         "prop_Lacto"), AVISITN == 0), prop_Lacto_pre_MTZ = assay$prop_Lacto), 
##         USUBJID, prop_Lacto_pre_MTZ), by = join_by(USUBJID)), 
##         select(mutate(filter(get_assay_wide_format(mae, "prop_Lacto"), 
##             AVISITN == 1), prop_Lacto_post_MTZ = assay$prop_Lacto), 
##             USUBJID, prop_Lacto_post_MTZ), by = join_by(USUBJID)), 
##         delta_prop_Lacto = prop_Lacto_post_MTZ - prop_Lacto_pre_MTZ)
##     df <- left_join(select(df, -any_of(c("prop_Lacto_max"))), 
##         summarize(group_by(arrange(select(mutate(filter(get_assay_wide_format(mae, 
##             "prop_Lacto"), AVISITN != 0, AVISITN < target_visit), 
##             prop_Lacto = assay$prop_Lacto), USUBJID, AVISITN, 
##             prop_Lacto), USUBJID, AVISITN), USUBJID), prop_Lacto_max = max(prop_Lacto, 
##             na.rm = TRUE), .groups = "drop"), by = join_by(USUBJID))
##     df <- left_join(select(df, -any_of(c("largest_BC_by_w8"))), 
##         summarize(group_by(mutate(left_join(select(df, USUBJID, 
##             ASV_p_pre_MTZ), arrange(select(mutate(filter(get_assay_wide_format(mae, 
##             "ASV_16S_filtered_p"), AVISITN != 0, AVISITN < 4), 
##             ASV_p_other = as.matrix(assay[, colnames(df$ASV_p_pre_MTZ)])), 
##             USUBJID, AVISITN, ASV_p_other), USUBJID, AVISITN), 
##             by = join_by(USUBJID)), BC = 1 - 2 * rowSums(pmin(ASV_p_pre_MTZ, 
##             ASV_p_other, na.rm = TRUE))/rowSums(ASV_p_pre_MTZ + 
##             ASV_p_other)), USUBJID), largest_BC_by_w8 = ifelse(any(!is.na(BC)), 
##             max(BC, na.rm = TRUE), NA), .groups = "drop"), by = join_by(USUBJID))
##     df <- left_join(select(df, -any_of(c("BC_pre_post", "BC_pre_post_abs"))), 
##         select(mutate(mutate(left_join(left_join(select(df, USUBJID, 
##             ASV_p_pre_MTZ), arrange(select(mutate(filter(get_assay_wide_format(mae, 
##             "ASV_16S_filtered_p"), AVISITN == 1), ASV_p_post_MTZ = as.matrix(assay[, 
##             colnames(df$ASV_p_pre_MTZ)])), USUBJID, AVISITN, 
##             ASV_p_post_MTZ), USUBJID, AVISITN), by = join_by(USUBJID)), 
##             select(mutate(distinct(filter(select(as_tibble(mae@colData), 
##                 USUBJID, AVISITN, LOAD), AVISITN == 1)), post_MTZ_load = replace_na(LOAD, 
##                 median(LOAD, na.rm = TRUE)), pre_MTZ_load = median(mae$LOAD[mae$AVISITN != 
##                 1], na.rm = TRUE)), USUBJID, post_MTZ_load, pre_MTZ_load), 
##             by = join_by(USUBJID)), ASV_pre_MTZ_abs = ASV_p_pre_MTZ * 
##             pre_MTZ_load, ASV_post_MTZ_abs = ASV_p_post_MTZ * 
##             post_MTZ_load), BC_pre_post_abs = 1 - 2 * rowSums(pmin(ASV_pre_MTZ_abs, 
##             ASV_pre_MTZ_abs, na.rm = TRUE))/rowSums(ASV_pre_MTZ_abs + 
##             ASV_pre_MTZ_abs), BC_pre_post = 1 - 2 * rowSums(pmin(ASV_p_pre_MTZ, 
##             ASV_p_post_MTZ, na.rm = TRUE))/rowSums(ASV_p_pre_MTZ + 
##             ASV_p_post_MTZ)), USUBJID, BC_pre_post_abs, BC_pre_post), 
##         by = join_by(USUBJID))
##     ASV_V0Vt <- mutate(filter(get_assay_wide_format(mae, "ASV_16S_filtered"), 
##         USUBJID %in% df$USUBJID, AVISITN %in% c(0, target_visit)), 
##         ID = paste(USUBJID, AVISITN, sep = "_"), )
##     BC_distances <- vegan::vegdist(ASV_V0Vt$assay %>% as.matrix() %>% 
##         set_rownames(ASV_V0Vt$ID), method = "bray")
##     ps_ASV_V0Vt <- phyloseq::phyloseq(otu_table(ASV_V0Vt$assay, 
##         taxa_are_rows = FALSE), sample_data(ASV_V0Vt %>% select(-assay)))
##     gt <- graph_perm_test(ps_ASV_V0Vt, sampletype = "USUBJID", 
##         distance = "(A+B-2*J)/(A+B)", type = "mst", nperm = 1000)
##     list(mae = mae, target_visit = target_visit, df = df, outcome_table = outcome_table, 
##         BC_all_others = BC_all_others, ASV_V0Vt = ASV_V0Vt, BC_distances = BC_distances, 
##         gt = gt)
## }

The function make_figures_recurrent_BV_composition() generates figures to visualize the results.

Code
make_figures_recurrent_BV_composition
## function (res) 
## {
##     BC_distributions <- make_distribution_figures(df = res$df, 
##         var = "BC_pre_MTZ_target", target_visit = res$target_visit)
##     PR_distributions <- make_distribution_figures(df = res$df, 
##         var = "percentile_rank", target_visit = res$target_visit)
##     BC_and_PR <- BC_and_PR_each_participant(BC_all_others = res$BC_all_others, 
##         df = res$df, target_visit = res$target_visit)
##     examples <- show_examples(res = res)
##     BC_and_PR_by_history <- BC_and_PR_by_history(df = res$df, 
##         target_visit = res$target_visit)
##     MST <- plot_MST(distances = res$BC_distances, sample_data = res$ASV_V0Vt) + 
##         ggtitle("MST based on Bray-Curtis distance")
##     main_figure <- free(BC_and_PR[[1]]) + free(BC_and_PR[[2]] + 
##         guides(shape = "none")) + free(examples) + BC_distributions$g_by_arm_and_BV + 
##         PR_distributions$g_by_arm_and_BV + plot_layout(heights = c(2, 
##         1), widths = c(3, 2, 2, 2), design = "\n        ABCC\n        ABDE\n      ") + 
##         plot_annotation(tag_levels = "A")
##     list(BC_distributions = BC_distributions, PR_distributions = PR_distributions, 
##         BC_and_PR = BC_and_PR, examples = examples, BC_and_PR_by_history = BC_and_PR_by_history, 
##         MST = MST, main_figure = main_figure)
## }

Results

Analysis based on week 24 BV and composition

Code
res_week_24 <- recurrent_BV_composition_analysis(mae, target_visit = 7)
figs_week_24 <- make_figures_recurrent_BV_composition(res_week_24)
Code
res_week_24$outcome_table
Number of participants included in the analysis by BV diagnosis and Lactobacillus dominance at the week 24 visit.
BV < 50% Lactobacillus
Yes 40
No 41
Unknown 2

Roughly half of participants who had <50% Lactobacillus at the week 24 visit were diagnosed with BV at that visit. All participants who were diagnosed with BV at the week 24 visit had <50% Lactobacillus at that visit.

Code
figs_week_24$MST + plot_permutations(res_week_24$gt) + 
  plot_annotation(
    tag_levels = "A"
    ) & 
  theme(plot.tag = element_text(size = 16, face = "bold")) 

(A) Minimum Spanning Tree (MST) built on the BC-dissimilarity matrix between microbiota composition as expressed in terms of ASV relative abundances at the pre-MTZ visit and the week 24 visit. Edges in black connect samples from the same participant, defining ‘pure edges’. (B): Distribution of the number of ‘pure’ edges (connecting samples from the same participant) under 1000 graph permutations. The red line indicates the observed number of such edges.
Code
figs_week_24$main_figure

(A) Bray-Curtis dissimilarity (x-axis) between the week 24 and the pre-MTZ microbiota composition (expressed as ASV relative abundances) for each participant (y-axis) differentiating between dissimilarity with self (large red dots) and with others (small gray dots). Participants who received a BV diagnosis at week 24 are shown with a closed red dot, while those without BV at week 24 by an open red dot. Participants are ordered by increasing dissimilarity to self. (B) Percentile rank (x-axis) of the dissimilarity to self among dissimilarities with self and others for each participant (y-axis), colored by the arm they were randomized to. As in (A), participants who received a BV diagnosis at week 24 are shown with a closed dot, and those without by an open dot. Participants are ordered as in A. (C) Examples of microbiota compositions pre-MTZ and at week 24 for a selection of participants denoted with letters ranging from A to E and identified by these letters in panels (A-B). Participant A returns to a microbiota composition highly similar to their pre-MTZ composition. Participants B and C are more similar to themselves than to others, but are much less similar to themselves than participant A, especially participant C. Participants D and E are highly dissimilar to themselves pre-MTZ, but for participant D, that appears to be explained by a particularly uncommon microbiota pre-MTZ, while E has a particularly uncommon microbiota at week 24. (D) Distribution of Bray-Curtis dissimilarities to self by arm (horizontal panels) and by BV diagnosis at week 24 (colors). (E) Distribution of percentile ranks by arm (horizontal panels) and by BV diagnosis at week 24 (colors).
Code
make_summary_stat_table(res = res_week_24)
Number and percentage of participant by similarity category.
cat All LACTIN-V Placebo
Most similar to self (min perc. rank) 18 (22%) [14% - 33%] 10 (22%) [11% - 37%] 8 (23%) [11% - 41%]
More similar to self than to others (> min. but < 50th perc. rank) 45 (56%) [44% - 66%] 22 (48%) [33% - 63%] 23 (66%) [48% - 80%]
More similar to others than to self (>= 50th perc. rank) 18 (22%) [14% - 33%] 14 (30%) [18% - 46%] 4 (11%) [4% - 28%]
Code
make_summary_stat_table_v2(res = res_week_24)
Number and percentage of participant by similarity category.
cat All LACTIN-V Placebo
More similar to self than to others (< 50th perc. rank) 63 (78%) [67% - 86%] 32 (70%) [54% - 82%] 31 (89%) [72% - 96%]
More similar to others than to self (>= 50th perc. rank) 18 (22%) [14% - 33%] 14 (30%) [18% - 46%] 4 (11%) [4% - 28%]
Code
figs_week_24$BC_distributions$patch + plot_annotation(tag_levels = "A")

Additional visualizations depicting the distributions of Bray-Curtis dissimilarities to self (top) and their associated percentile ranks (bottom).
Code
figs_week_24$PR_distributions$patch + plot_annotation(tag_levels = "A")

Additional visualizations depicting the distributions of Bray-Curtis dissimilarities to self (top) and their associated percentile ranks (bottom).
Code
figs_week_24$BC_and_PR_by_history + plot_annotation(tag_levels = "A")

(A-D) Bray-Curtis dissimilarity to self (y-axis) by (A) post-MTZ total Lactobacillus relative abundance; (B) difference in total Lactobacillus relative abundance between the post-MTZ and pre-MTZ visits; (C) maximum total Lactobacillus relative abundance post-MTZ and until the target visit; (D) Bray-Curtis dissimilarity between pre-MTZ and post-MTZ visits. (E-H) same as (A-D) but for the percentile ranks.

Analysis based on week 12 BV and composition

Code
res_week_12 <- recurrent_BV_composition_analysis(mae, target_visit = 4)
figs_week_12 <- make_figures_recurrent_BV_composition(res_week_12)
Code
res_week_12$outcome_table
Number of participants included in the analysis by BV diagnosis and Lactobacillus dominance at the week 24 visit.
BV ≥ 50% Lactobacillus < 50% Lactobacillus
Yes 2 40
No 0 41
Unknown 0 2

Roughly half of participants who had <50% Lactobacillus at the week 12 visit were diagnosed with BV at that visit. 2 participants who were diagnosed with BV at the week 12 visit had ≥50% Lactobacillus at that visit.

Code
figs_week_12$MST + plot_permutations(res_week_12$gt)

(A) Minimum Spanning Tree (MST) built on the BC-dissimilarity matrix between microbiota composition as expressed in terms of ASV relative abundances at the pre-MTZ visit and the week 12 visit. Edges in black connect samples from the same participant, defining ‘pure edges’. (B): Distribution of the number of ‘pure’ edges (connecting samples from the same participant) under 1000 graph permutations. The red line indicates the observed number of such edges.
Code
figs_week_12$main_figure

(A) Bray-Curtis dissimilarity (x-axis) between the week 12 and the pre-MTZ microbiota composition (expressed as ASV relative abundances) for each participant (y-axis) differentiating between dissimilarity with self (large red dots) and with others (small gray dots). Participants who received a BV diagnosis at week 12 are shown with a closed red dot, while those without BV at week 12 by an open red dot. Participants are ordered by increasing dissimilarity to self. (B) Percentile rank (x-axis) of the dissimilarity to self among dissimilarities with self and others for each participant (y-axis), colored by the arm they were randomized to. As in (A), participants who received a BV diagnosis at week 12 are shown with a closed dot, and those without by an open dot. Participants are ordered as in A. (C) Examples of microbiota compositions pre-MTZ and at week 12 for a selection of participants denoted with letters ranging from A to E and identified by these letters in panels (A-B). Participant A returns to a microbiota composition highly similar to their pre-MTZ composition. Participants B and C are more similar to themselves than to others, but are much less similar to themselves than participant A, especially participant C. Participants D and E are highly dissimilar to themselves pre-MTZ, but for participant D, that appears to be explained by a particularly uncommon microbiota pre-MTZ, while E has a particularly uncommon microbiota at week 12. (D) Distribution of Bray-Curtis dissimilarities to self by arm (horizontal panels) and by BV diagnosis at week 12 (colors). (E) Distribution of percentile ranks by arm (horizontal panels) and by BV diagnosis at week 12 (colors).
Code
make_summary_stat_table(res = res_week_12)
Number and percentage of participant by similarity category.
cat All LACTIN-V Placebo
Most similar to self (min perc. rank) 21 (25%) [17% - 36%] 14 (26%) [15% - 40%] 7 (24%) [11% - 44%]
More similar to self than to others (> min. but < 50th perc. rank) 49 (59%) [48% - 70%] 29 (54%) [40% - 67%] 20 (69%) [49% - 84%]
More similar to others than to self (>= 50th perc. rank) 13 (16%) [9% - 26%] 11 (20%) [11% - 34%] 2 (7%) [1% - 24%]
Code
make_summary_stat_table_v2(res = res_week_12)
Number and percentage of participant by similarity category.
cat All LACTIN-V Placebo
More similar to self than to others (< 50th perc. rank) 70 (84%) [74% - 91%] 43 (80%) [66% - 89%] 27 (93%) [76% - 99%]
More similar to others than to self (>= 50th perc. rank) 13 (16%) [9% - 26%] 11 (20%) [11% - 34%] 2 (7%) [1% - 24%]
Code
figs_week_12$BC_distributions$patch
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_ydensity()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`position_quasirandom()`).

Code
figs_week_12$PR_distributions$patch
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_ydensity()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`position_quasirandom()`).

Code
figs_week_12$BC_and_PR_by_history

(A-D) Bray-Curtis dissimilarity to self (y-axis) by (A) post-MTZ total Lactobacillus relative abundance; (B) difference in total Lactobacillus relative abundance between the post-MTZ and pre-MTZ visits; (C) maximum total Lactobacillus relative abundance post-MTZ and until the target visit; (D) Bray-Curtis dissimilarity between pre-MTZ and post-MTZ visits. (E-H) same as (A-D) but for the percentile ranks.

Conclusions

78% of participants who did not colonized with Lactobacillus end up with a microbiota more similar to their initial BV microbiota than to the microbiota of other participants, suggesting that the antibiotics and study products failed to significantly disrupt their microbiota. That fraction is higher in the Placebo arm (89%) than in the Lactin-V arm (70%) although confidence intervals overlap: 72% - 96% (Placebo) vs 54% - 82% (Lactin-V), and there was no evidence that participants who experienced a larger MTZ-induced shifts in their microbiota composition would be less likely to return to their initial BV microbiota.